Compose-Reduce Parsing
نویسندگان
چکیده
Two new parsing algorithms for context-free phrase structure grammars are presented which perform a bounded amount of processing per word per analysis path, independently of sentence length. They are thus capable of parsing in real-time in a parallel implementation which forks processors in response to non-determinis-tic choice points. 0. INTRODUCTION The work reported here grew out of our attempt to improve on the o (n 2) performance of the SIMD parallel parser described in (Thompson 1991). Rather than start with a commitment to a specific SIMD architecture, as that work had, we agreed that the best place to start was with a more abstract architecture-independent consideration of the CF-PSG parsing problem-given arbitrary resources, what algorithms could one envisage which could recognise and/or parse atomic category phrase-structure grammars in o (n) ? In the end, two quite different approaches emerged. One took as its starting point non-deterministic shift-reduce parsing, and sought to achieve linear (indeed real-time) complexity by performing a constant-time step per word of the input. The other took as its starting point tabular parsing (Earley, C KY), and sought to achieve linear complexity by performing a constant-time step for the identi-fication/construction of constituents of each length from 0 to n. The latter route has been widely canvassed, although to our knowledge has not yet been implemented-see (Nijholt 1989, 90) for extensive references. The former route, whereby real-time parsing is achieved by processor forking at non-deterministic choice points in an extended shill-reduce parser, is to our knowledge new. In this paper we present outlines of two such parsers, which we call compose-reduce parsers. L COMPOSE-Rk~nUCE PARSING Why couldn't a simple breadth-first chart parser achieve linear performance on an appropriate parallel system? If you provided enough processors to immediately process all agenda entries as they were created, would not this give the desired result? No, because the processing of a single word might require many serialised 87
منابع مشابه
Batched Shift Reduce Parsing with Lists of Vectors on CUDA
Shift Reduce Parsing is a common algorithm used in compilers and natural language processing, and can be used to compose a sequence of fixed-length vectors into a single vector of equal length. Previous versions are implemented using predetermined computational graphs that trade excessive memory and computation to minimize transfers of memory from the device to the host. In this paper, I presen...
متن کاملAn improved joint model: POS tagging and dependency parsing
Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
متن کاملResolving Coordinate Structures for Chinese Constituent Parsing
Coordinate structures are linguistic structures consisting of two or more conjuncts, which usually compose into larger constituent as a whole unit. However, the boundary of each conjunct is difficult to identify, which makes it difficult to parse the whole coordinate and larger structures. In labeled data, such as the Penn Chinese Tree Bank (CTB), coordinate structures are not labeled explicitl...
متن کاملA Parallel Augmented Context-Free Parsing System For Natural Language Analysis
Parsing efficiency is one of the important issues in building practical natural language processing systems. This paper proposes a design and an implementation of a parallel augmented context-free parsing system for natural language analysis. Natural language grammars are more than context-free, so that unification formalisms are adopted to enforce the linguistic constraints and to transfer the...
متن کاملPractical Dynamic Grammars for Dynamic Languages
Grammars for programming languages are traditionally specified statically. They are hard to compose and reuse due to ambiguities that inevitably arise. PetitParser combines ideas from scannerless parsing, parser combinators, parsing expression grammars and packrat parsers to model grammars and parsers as objects that can be reconfigured dynamically. Through examples and benchmarks we demonstrat...
متن کامل